Visual speech segmentation and speaker recognition for transcription of TV news

Author

  • Josef Chaloupka
Abstract

This paper presents a method for visual segmentation of TV news. In this method, TV news shows are segmented according to the visual stream of the TV video recordings. Human faces are located in the individual visual segments using a fast face-detection algorithm. The detected faces are compared with visual GMMs that were trained on video images of the individual broadcasters (anchors) of the TV news. The visual segments in which a broadcaster's face was detected and recognized are then compared with the acoustic segments obtained from acoustic segmentation. Speaker-adapted HMMs are used for speech recognition of these acoustic segments, and the recognition rate with these speaker-adapted HMMs is higher than with speaker-independent HMMs. Speaker identification and verification methods based on the acoustic signal can also be applied to the acoustic segments. Visual speaker identification will give better results for a smaller number of speakers and for TV news recordings with heavy noise in the acoustic signal.
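
The abstract does not give implementation details, so the following is only an illustrative sketch under stated assumptions, not the author's method. It assumes each detected face has already been reduced to a short numeric feature vector (the feature type is hypothetical), that each anchor has a pre-trained diagonal-covariance GMM, that a visual segment is assigned to the anchor with the highest mean log-likelihood (or rejected below a hypothetical threshold), and that acoustic segments inherit an anchor label by time overlap with the labeled visual segments so that the matching speaker-adapted HMM can be chosen. The names `DiagGMM`, `identify_anchor`, `label_acoustic_segments` and `choose_hmm`, the threshold and the `min_ratio` overlap rule are all hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class DiagGMM:
    """Diagonal-covariance GMM for one anchor (hypothetical pre-trained parameters)."""
    weights: list[float]
    means: list[list[float]]
    variances: list[list[float]]

    def log_likelihood(self, x: list[float]) -> float:
        # log p(x) = logsumexp_k [log w_k + log N(x; mu_k, diag(var_k))]
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            terms.append(ll)
        m = max(terms)  # subtract the max for a numerically stable log-sum-exp
        return m + math.log(sum(math.exp(t - m) for t in terms))


def identify_anchor(face_vectors, models, threshold):
    """Return the anchor whose GMM gives the highest mean log-likelihood over the
    (non-empty) face vectors of one visual segment, or None if below threshold."""
    best_name, best_score = None, -math.inf
    for name, gmm in models.items():
        score = sum(gmm.log_likelihood(v) for v in face_vectors) / len(face_vectors)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def overlap(a_start, a_end, b_start, b_end):
    """Length of the time overlap of two intervals (0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_acoustic_segments(visual_segments, acoustic_segments, min_ratio=0.5):
    """visual_segments: (start, end, anchor or None); acoustic_segments: (start, end).
    An acoustic segment gets the anchor label covering at least min_ratio of it."""
    labels = []
    for a_start, a_end in acoustic_segments:
        cover = {}
        for v_start, v_end, anchor in visual_segments:
            if anchor is not None:
                cover[anchor] = cover.get(anchor, 0.0) + overlap(a_start, a_end, v_start, v_end)
        best = max(cover, key=cover.get, default=None)
        duration = a_end - a_start
        if best is not None and duration > 0 and cover[best] / duration >= min_ratio:
            labels.append(best)
        else:
            labels.append(None)
    return labels


def choose_hmm(label):
    """Speaker-adapted (SA) HMM for a recognized anchor, speaker-independent (SI) otherwise."""
    return ("SA", label) if label is not None else ("SI", None)
```

In this sketch a segment with no recognized anchor falls back to the speaker-independent HMM, mirroring the abstract's comparison of speaker-adapted and speaker-independent models. The overlap ratio is just one simple way to reconcile visual and acoustic boundaries that need not coincide.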

Similar resources

Various Methods for Visual Speaker Identification for Automatic Continuous Speech Recognition in TV Broadcast Programs

This paper is about different methods and algorithms that were used for speaker identification from the video recordings of TV broadcast news transcription. The information from visual speaker identification was used in our complex system for automatic continuous speech recognition of TV broadcast programs because it is possible to use speaker adapted (SA) Hidden Markov Models (HMMs) if we hav...

Audio-Visual Speaker Recognition for Video Broadcast News

Significant progress has been made in the transcription of the audio stream in the broadcast news domain for both radio news and TV news (HUB4 task). Such transcripts provide an excellent means of indexing video content for search and retrieval. Speaker identification is an important technology in this domain both for selecting high-accuracy speaker-dependent models for transcription and as an in...

Robust Unsupervised Speaker Segmentation for Audio Diarization

Audio diarization Reynolds & Carrasquillo (2005) is the process of partitioning an input audio stream into homogeneous regions according to their specific audio sources. These sources can include audio type (speech, music, background noise, etc.), speaker identity and channel characteristics. With the continually increasing number of large volumes of spoken documents including broadcasts, voic...

Advances in automatic transcription of Italian broadcast news

This paper presents some recent improvements in automatic transcription of Italian broadcast news obtained at ITCirst. A first preliminary activity was carried out in order to develop a suitable speech corpus for the Italian language. The resulting corpus, formed by recordings covering 30 hours of radio news, was exploited for developing a baseline system for transcription of broadcast news. Th...

Speech cohesion for topic segmentation of spoken contents

In this paper, we introduce the notion of speech cohesion for topic segmentation of spoken content. The aim is to integrate speaker information and lexical information within a single cohesion value. Based on a lexical cohesion system, we propose an approach that directly integrates the speaker distribution when processing the cohesion. A potential boundary is effective if the joint distribut...

Publication date: 2006